Overview
Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 10639 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.4 MiB |
| Average record size in memory | 136.0 B |
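As a quick consistency check on the memory figures, the average record size multiplied by the number of observations should land close to the reported total. The sketch below uses only the two numbers from the table above; the variable names are illustrative.

```python
# Figures copied from the Dataset statistics table.
n_observations = 10639
avg_record_bytes = 136.0

# Total size implied by the per-record average, converted to MiB (1 MiB = 2**20 bytes).
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20

print(f"{total_bytes:.0f} B is about {total_mib:.2f} MiB")
```

Rounded to one decimal place, the implied total agrees with the 1.4 MiB shown in the table.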
Variable types
| DateTime | 2 |
|---|---|
| Numeric | 9 |
| Categorical | 5 |
Alerts
| Alert | Type |
|---|---|
| mta_tax has constant value "0.5" | Constant |
| improvement_surcharge has constant value "0.3" | Constant |
| RatecodeID is highly overall correlated with fare_amount and 1 other fields | High correlation |
| duration_minutes is highly overall correlated with fare_amount and 2 other fields | High correlation |
| fare_amount is highly overall correlated with RatecodeID and 3 other fields | High correlation |
| tip_amount is highly overall correlated with total_amount | High correlation |
| total_amount is highly overall correlated with duration_minutes and 3 other fields | High correlation |
| trip_distance is highly overall correlated with RatecodeID and 3 other fields | High correlation |
| RatecodeID is highly imbalanced (99.6%) | Imbalance |
| payment_type is highly imbalanced (52.7%) | Imbalance |
| duration_minutes is highly skewed (γ1 = 20.6986304) | Skewed |
| tip_amount has 3598 (33.8%) zeros | Zeros |
| tolls_amount has 10327 (97.1%) zeros | Zeros |
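The two imbalance percentages above are consistent with one minus the normalized Shannon entropy of the class counts. The sketch below recomputes them from the counts reported in the RatecodeID and payment_type sections. Treating this as the formula behind the alert is an assumption based on the numbers matching, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention: a single class is reported as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts copied from the Common Values tables of each variable.
ratecode_counts = [10636, 3]
payment_counts = [7340, 3227, 56, 16]

print(f"RatecodeID:   {imbalance_score(ratecode_counts):.1%}")
print(f"payment_type: {imbalance_score(payment_counts):.1%}")
```

A score near 100% means almost all rows share one class, which is why RatecodeID (3 rows out of 10639 in the minority class) scores so much higher than payment_type.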
Reproduction
| Analysis started | 2025-12-09 11:34:57.222316 |
|---|---|
| Analysis finished | 2025-12-09 11:35:09.277819 |
| Duration | 12.06 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
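The reported duration can be checked directly from the two timestamps. A minimal sketch with the standard library, using only the values in the table above:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction table.
started = datetime.strptime("2025-12-09 11:34:57.222316", FMT)
finished = datetime.strptime("2025-12-09 11:35:09.277819", FMT)

elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")
```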
Variables
tpep_pickup_datetime
Date
| Distinct | 10635 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 166.2 KiB |
| Minimum | 2017-01-01 00:08:25 |
|---|---|
| Maximum | 2017-12-31 23:45:30 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
tpep_dropoff_datetime
Date
| Distinct | 10634 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 166.2 KiB |
| Minimum | 2017-01-01 00:17:20 |
|---|---|
| Maximum | 2017-12-31 23:49:24 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.636244 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 9 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.2574375 |
|---|---|
| Coefficient of variation (CV) | 0.76849022 |
| Kurtosis | 3.8172941 |
| Mean | 1.636244 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.1823746 |
| Sum | 17408 |
| Variance | 1.5811491 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7491 | 70.4% |
| 2 | 1644 | 15.5% |
| 5 | 538 | 5.1% |
| 3 | 455 | 4.3% |
| 6 | 283 | 2.7% |
| 4 | 219 | 2.1% |
| 0 | 9 | 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9 | 0.1% |
| 1 | 7491 | 70.4% |
| 2 | 1644 | 15.5% |
| 3 | 455 | 4.3% |
| 4 | 219 | 2.1% |
| 5 | 538 | 5.1% |
| 6 | 283 | 2.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 283 | 2.7% |
| 5 | 538 | 5.1% |
| 4 | 219 | 2.1% |
| 3 | 455 | 4.3% |
| 2 | 1644 | 15.5% |
| 1 | 7491 | 70.4% |
| 0 | 9 | 0.1% |
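Because passenger_count takes only seven distinct values, the frequency table above is enough to recompute the descriptive statistics. The sketch below does so with the standard library. Using the sample variance (n - 1 in the denominator) is an assumption, chosen because it reproduces the reported figure.

```python
import math

# value -> count, copied from the passenger_count frequency table.
freq = {0: 9, 1: 7491, 2: 1644, 3: 455, 4: 219, 5: 538, 6: 283}

n = sum(freq.values())                          # number of observations
total = sum(v * c for v, c in freq.items())     # reported as "Sum"
mean = total / n

# Sample variance with n - 1 in the denominator (assumed convention).
variance = sum(c * (v - mean) ** 2 for v, c in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                 # coefficient of variation

print(f"n={n} sum={total} mean={mean:.6f}")
print(f"variance={variance:.7f} std={std:.7f} cv={cv:.8f}")
```

The same approach works for any variable whose full value counts are listed; for variables with an "Other values" row the table is incomplete and the statistics cannot be recovered this way.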
trip_distance
Real number (ℝ)
High correlation
| Distinct | 1060 |
|---|---|
| Distinct (%) | 10.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.7145202 |
| Minimum | 0 |
|---|---|
| Maximum | 30.83 |
| Zeros | 44 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 1 |
| median | 1.72 |
| Q3 | 3.2 |
| 95-th percentile | 8.96 |
| Maximum | 30.83 |
| Range | 30.83 |
| Interquartile range (IQR) | 2.2 |
Descriptive statistics
| Standard deviation | 2.8447256 |
|---|---|
| Coefficient of variation (CV) | 1.0479663 |
| Kurtosis | 10.93103 |
| Mean | 2.7145202 |
| Median Absolute Deviation (MAD) | 0.88 |
| Skewness | 2.7688079 |
| Sum | 28879.78 |
| Variance | 8.0924636 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 252 | 2.4% |
| 1.1 | 235 | 2.2% |
| 0.8 | 220 | 2.1% |
| 0.9 | 215 | 2.0% |
| 1.2 | 209 | 2.0% |
| 0.7 | 203 | 1.9% |
| 1.3 | 185 | 1.7% |
| 1.4 | 183 | 1.7% |
| 0.6 | 175 | 1.6% |
| 1.5 | 172 | 1.6% |
| Other values (1050) | 8590 | 80.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 44 | 0.4% |
| 0.01 | 3 | < 0.1% |
| 0.02 | 4 | < 0.1% |
| 0.03 | 2 | < 0.1% |
| 0.04 | 2 | < 0.1% |
| 0.06 | 1 | < 0.1% |
| 0.07 | 2 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 15 | 0.1% |
| 0.11 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 30.83 | 1 | < 0.1% |
| 27.88 | 1 | < 0.1% |
| 27.34 | 1 | < 0.1% |
| 26.54 | 1 | < 0.1% |
| 25.86 | 1 | < 0.1% |
| 25.8 | 1 | < 0.1% |
| 24.89 | 1 | < 0.1% |
| 24.61 | 1 | < 0.1% |
| 24.1 | 1 | < 0.1% |
| 23.67 | 1 | < 0.1% |
RatecodeID
Categorical
High correlation Imbalance
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.6 KiB |
| 1 | 10636 |
|---|---|
| 4 | 3 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10636 | > 99.9% |
| 4 | 3 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 10636 | > 99.9% |
| 4 | 3 | < 0.1% |
PULocationID
Real number (ℝ)
| Distinct | 123 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 160.64677 |
| Minimum | 4 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 113 |
| median | 161 |
| Q3 | 231 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 261 |
| Interquartile range (IQR) | 118 |
Descriptive statistics
| Standard deviation | 66.118582 |
|---|---|
| Coefficient of variation (CV) | 0.41157741 |
| Kurtosis | -0.95289156 |
| Mean | 160.64677 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.19928603 |
| Sum | 1709121 |
| Variance | 4371.6669 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 230 | 410 | 3.9% |
| 48 | 410 | 3.9% |
| 234 | 405 | 3.8% |
| 79 | 401 | 3.8% |
| 162 | 396 | 3.7% |
| 161 | 392 | 3.7% |
| 237 | 365 | 3.4% |
| 186 | 349 | 3.3% |
| 170 | 335 | 3.1% |
| 163 | 313 | 2.9% |
| Other values (113) | 6863 | 64.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 30 | 0.3% |
| 7 | 17 | 0.2% |
| 12 | 2 | < 0.1% |
| 13 | 74 | 0.7% |
| 14 | 2 | < 0.1% |
| 17 | 5 | < 0.1% |
| 24 | 22 | 0.2% |
| 25 | 16 | 0.2% |
| 28 | 1 | < 0.1% |
| 29 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 2 | < 0.1% |
| 264 | 175 | 1.6% |
| 263 | 166 | 1.6% |
| 262 | 64 | 0.6% |
| 261 | 51 | 0.5% |
| 260 | 8 | 0.1% |
| 256 | 9 | 0.1% |
| 255 | 23 | 0.2% |
| 249 | 311 | 2.9% |
| 247 | 1 | < 0.1% |
DOLocationID
Real number (ℝ)
| Distinct | 194 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 158.11495 |
| Minimum | 4 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 41 |
| Q1 | 100 |
| median | 161 |
| Q3 | 233 |
| 95-th percentile | 261 |
| Maximum | 265 |
| Range | 261 |
| Interquartile range (IQR) | 133 |
Descriptive statistics
| Standard deviation | 72.645088 |
|---|---|
| Coefficient of variation (CV) | 0.45944476 |
| Kurtosis | -1.0902426 |
| Mean | 158.11495 |
| Median Absolute Deviation (MAD) | 70 |
| Skewness | -0.25363414 |
| Sum | 1682185 |
| Variance | 5277.3087 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 48 | 357 | 3.4% |
| 236 | 319 | 3.0% |
| 170 | 319 | 3.0% |
| 230 | 306 | 2.9% |
| 79 | 306 | 2.9% |
| 186 | 290 | 2.7% |
| 239 | 279 | 2.6% |
| 142 | 276 | 2.6% |
| 141 | 262 | 2.5% |
| 237 | 249 | 2.3% |
| Other values (184) | 7676 | 72.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 63 | 0.6% |
| 7 | 62 | 0.6% |
| 9 | 2 | < 0.1% |
| 10 | 5 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 6 | 0.1% |
| 13 | 86 | 0.8% |
| 14 | 12 | 0.1% |
| 15 | 3 | < 0.1% |
| 16 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 265 | 12 | 0.1% |
| 264 | 145 | 1.4% |
| 263 | 211 | 2.0% |
| 262 | 143 | 1.3% |
| 261 | 38 | 0.4% |
| 260 | 15 | 0.1% |
| 259 | 3 | < 0.1% |
| 257 | 16 | 0.2% |
| 256 | 42 | 0.4% |
| 255 | 66 | 0.6% |
payment_type
Categorical
Imbalance
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 602.6 KiB |
| 1 | 7340 |
|---|---|
| 2 | 3227 |
| 3 | 56 |
| 4 | 16 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7340 | 69.0% |
| 2 | 3227 | 30.3% |
| 3 | 56 | 0.5% |
| 4 | 16 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 7340 | 69.0% |
| 2 | 3227 | 30.3% |
| 3 | 56 | 0.5% |
| 4 | 16 | 0.2% |
fare_amount
Real number (ℝ)
High correlation
| Distinct | 128 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.923912 |
| Minimum | 2.5 |
|---|---|
| Maximum | 85.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 2.5 |
|---|---|
| 5-th percentile | 4.5 |
| Q1 | 6.5 |
| median | 9.5 |
| Q3 | 14.25 |
| 95-th percentile | 29.5 |
| Maximum | 85.5 |
| Range | 83 |
| Interquartile range (IQR) | 7.75 |
Descriptive statistics
| Standard deviation | 8.3527054 |
|---|---|
| Coefficient of variation (CV) | 0.70050042 |
| Kurtosis | 7.4730187 |
| Mean | 11.923912 |
| Median Absolute Deviation (MAD) | 3.5 |
| Skewness | 2.2713603 |
| Sum | 126858.5 |
| Variance | 69.767687 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 555 | 5.2% |
| 6.5 | 513 | 4.8% |
| 5.5 | 507 | 4.8% |
| 7 | 506 | 4.8% |
| 7.5 | 487 | 4.6% |
| 5 | 470 | 4.4% |
| 8.5 | 456 | 4.3% |
| 8 | 436 | 4.1% |
| 9 | 435 | 4.1% |
| 9.5 | 409 | 3.8% |
| Other values (118) | 5865 | 55.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.5 | 58 | 0.5% |
| 3 | 47 | 0.4% |
| 3.5 | 147 | 1.4% |
| 4 | 268 | 2.5% |
| 4.5 | 357 | 3.4% |
| 5 | 470 | 4.4% |
| 5.5 | 507 | 4.8% |
| 6 | 555 | 5.2% |
| 6.5 | 513 | 4.8% |
| 7 | 506 | 4.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 85.5 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
| 76 | 1 | < 0.1% |
| 73 | 1 | < 0.1% |
| 72.5 | 1 | < 0.1% |
| 70.5 | 1 | < 0.1% |
| 67.5 | 1 | < 0.1% |
| 66 | 2 | < 0.1% |
| 64.5 | 1 | < 0.1% |
extra
Categorical
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 7086 | 66.6% |
| 1.0 | 3553 | 33.4% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10639 | 33.3% |
| . | 10639 | 33.3% |
| 5 | 7086 | 22.2% |
| 1 | 3553 | 11.1% |
mta_tax
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 623.4 KiB |
| 0.5 | 10639 |
|---|---|
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.5 |
|---|---|
| 2nd row | 0.5 |
| 3rd row | 0.5 |
| 4th row | 0.5 |
| 5th row | 0.5 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 10639 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10639 | 33.3% |
| . | 10639 | 33.3% |
| 5 | 10639 | 33.3% |
tip_amount
Real number (ℝ)
High correlation Zeros
| Distinct | 529 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.7493167 |
| Minimum | 0 |
|---|---|
| Maximum | 28 |
| Zeros | 3598 |
| Zeros (%) | 33.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1.46 |
| Q3 | 2.46 |
| 95-th percentile | 5.46 |
| Maximum | 28 |
| Range | 28 |
| Interquartile range (IQR) | 2.46 |
Descriptive statistics
| Standard deviation | 2.0059724 |
|---|---|
| Coefficient of variation (CV) | 1.1467177 |
| Kurtosis | 10.398365 |
| Mean | 1.7493167 |
| Median Absolute Deviation (MAD) | 1.46 |
| Skewness | 2.2919282 |
| Sum | 18610.98 |
| Variance | 4.0239251 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3598 | 33.8% |
| 1 | 708 | 6.7% |
| 2 | 379 | 3.6% |
| 1.5 | 164 | 1.5% |
| 1.66 | 115 | 1.1% |
| 3 | 114 | 1.1% |
| 1.96 | 105 | 1.0% |
| 2.06 | 104 | 1.0% |
| 1.45 | 102 | 1.0% |
| 1.46 | 102 | 1.0% |
| Other values (519) | 5148 | 48.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3598 | 33.8% |
| 0.01 | 6 | 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.08 | 1 | < 0.1% |
| 0.1 | 4 | < 0.1% |
| 0.12 | 1 | < 0.1% |
| 0.2 | 5 | < 0.1% |
| 0.26 | 2 | < 0.1% |
| 0.34 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 18.56 | 1 | < 0.1% |
| 15.95 | 1 | < 0.1% |
| 15.32 | 1 | < 0.1% |
| 15 | 2 | < 0.1% |
| 14.86 | 1 | < 0.1% |
| 14.84 | 1 | < 0.1% |
| 14.76 | 1 | < 0.1% |
| 14.46 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
Zeros
| Distinct | 15 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.16687001 |
| Minimum | 0 |
|---|---|
| Maximum | 17.28 |
| Zeros | 10327 |
| Zeros (%) | 97.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 17.28 |
| Range | 17.28 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.98181906 |
|---|---|
| Coefficient of variation (CV) | 5.883736 |
| Kurtosis | 44.538191 |
| Mean | 0.16687001 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.234667 |
| Sum | 1775.33 |
| Variance | 0.96396868 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10327 | 97.1% |
| 5.76 | 220 | 2.1% |
| 5.54 | 71 | 0.7% |
| 2.64 | 6 | 0.1% |
| 2.54 | 5 | < 0.1% |
| 11.52 | 1 | < 0.1% |
| 2.16 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 17.28 | 1 | < 0.1% |
| 5.49 | 1 | < 0.1% |
| Other values (5) | 5 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10327 | 97.1% |
| 2.16 | 1 | < 0.1% |
| 2.54 | 5 | < 0.1% |
| 2.64 | 6 | 0.1% |
| 2.7 | 1 | < 0.1% |
| 5.16 | 1 | < 0.1% |
| 5.49 | 1 | < 0.1% |
| 5.54 | 71 | 0.7% |
| 5.76 | 220 | 2.1% |
| 6.32 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17.28 | 1 | < 0.1% |
| 16.62 | 1 | < 0.1% |
| 11.52 | 1 | < 0.1% |
| 10.5 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 6.32 | 1 | < 0.1% |
| 5.76 | 220 | 2.1% |
| 5.54 | 71 | 0.7% |
| 5.49 | 1 | < 0.1% |
| 5.16 | 1 | < 0.1% |
improvement_surcharge
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 623.4 KiB |
| 0.3 | 10639 |
|---|---|
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.3 |
|---|---|
| 2nd row | 0.3 |
| 3rd row | 0.3 |
| 4th row | 0.3 |
| 5th row | 0.3 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.3 | 10639 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10639 | 33.3% |
| . | 10639 | 33.3% |
| 3 | 10639 | 33.3% |
total_amount
Real number (ℝ)
High correlation
| Distinct | 880 |
|---|---|
| Distinct (%) | 8.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.308827 |
| Minimum | 3.8 |
|---|---|
| Maximum | 111.38 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 3.8 |
|---|---|
| 5-th percentile | 6.3 |
| Q1 | 8.8 |
| median | 12.3 |
| Q3 | 17.8 |
| 95-th percentile | 37.079 |
| Maximum | 111.38 |
| Range | 107.58 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Standard deviation | 10.066952 |
|---|---|
| Coefficient of variation (CV) | 0.65759137 |
| Kurtosis | 7.7735122 |
| Mean | 15.308827 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 2.3591718 |
| Sum | 162870.61 |
| Variance | 101.34353 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.3 | 257 | 2.4% |
| 8.3 | 235 | 2.2% |
| 7.8 | 234 | 2.2% |
| 8.8 | 229 | 2.2% |
| 6.8 | 229 | 2.2% |
| 10.3 | 223 | 2.1% |
| 10.8 | 209 | 2.0% |
| 9.3 | 206 | 1.9% |
| 9.8 | 199 | 1.9% |
| 6.3 | 182 | 1.7% |
| Other values (870) | 8436 | 79.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.8 | 34 | 0.3% |
| 4.3 | 33 | 0.3% |
| 4.56 | 1 | < 0.1% |
| 4.75 | 1 | < 0.1% |
| 4.8 | 61 | 0.6% |
| 5 | 2 | < 0.1% |
| 5.15 | 2 | < 0.1% |
| 5.16 | 2 | < 0.1% |
| 5.28 | 2 | < 0.1% |
| 5.3 | 99 | 0.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 111.38 | 1 | < 0.1% |
| 92.84 | 1 | < 0.1% |
| 91.9 | 1 | < 0.1% |
| 89.44 | 1 | < 0.1% |
| 89.16 | 1 | < 0.1% |
| 88.56 | 1 | < 0.1% |
| 86.76 | 1 | < 0.1% |
| 85.06 | 1 | < 0.1% |
| 83.56 | 1 | < 0.1% |
| 80.9 | 1 | < 0.1% |
duration_minutes
Real number (ℝ)
High correlation Skewed
| Distinct | 2190 |
|---|---|
| Distinct (%) | 20.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.324756 |
| Minimum | 0.033333333 |
|---|---|
| Maximum | 1439.55 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 166.2 KiB |
Quantile statistics
| Minimum | 0.033333333 |
|---|---|
| 5-th percentile | 2.9833333 |
| Q1 | 6.55 |
| median | 10.866667 |
| Q3 | 17.316667 |
| 95-th percentile | 31.901667 |
| Maximum | 1439.55 |
| Range | 1439.5167 |
| Interquartile range (IQR) | 10.766667 |
Descriptive statistics
| Standard deviation | 66.211797 |
|---|---|
| Coefficient of variation (CV) | 4.0559133 |
| Kurtosis | 436.62145 |
| Mean | 16.324756 |
| Median Absolute Deviation (MAD) | 4.9666667 |
| Skewness | 20.69863 |
| Sum | 173679.08 |
| Variance | 4384.002 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5.383333333 | 20 | 0.2% |
| 7.566666667 | 18 | 0.2% |
| 3.783333333 | 18 | 0.2% |
| 7.05 | 18 | 0.2% |
| 9.383333333 | 18 | 0.2% |
| 8.566666667 | 18 | 0.2% |
| 6.1 | 18 | 0.2% |
| 13.36666667 | 17 | 0.2% |
| 5.316666667 | 17 | 0.2% |
| 10.15 | 17 | 0.2% |
| Other values (2180) | 10460 | 98.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.03333333333 | 4 | < 0.1% |
| 0.05 | 7 | 0.1% |
| 0.06666666667 | 1 | < 0.1% |
| 0.08333333333 | 4 | < 0.1% |
| 0.1 | 5 | < 0.1% |
| 0.1333333333 | 4 | < 0.1% |
| 0.15 | 3 | < 0.1% |
| 0.1666666667 | 2 | < 0.1% |
| 0.1833333333 | 2 | < 0.1% |
| 0.2 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1439.55 | 1 | < 0.1% |
| 1439.15 | 1 | < 0.1% |
| 1438.65 | 1 | < 0.1% |
| 1438.55 | 1 | < 0.1% |
| 1438.466667 | 1 | < 0.1% |
| 1438.266667 | 1 | < 0.1% |
| 1436.5 | 1 | < 0.1% |
| 1435.8 | 1 | < 0.1% |
| 1433.983333 | 1 | < 0.1% |
| 1432.916667 | 1 | < 0.1% |
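The largest duration_minutes values sit just under 1440 minutes (24 hours), far from the bulk of the distribution. One common way to quantify that gap is Tukey's 1.5 × IQR rule, sketched below with the quartiles reported above. The report itself does not apply this rule; it is shown here only as an illustrative convention.

```python
# Quartiles, 95th percentile and maximum copied from the duration_minutes statistics.
q1, q3 = 6.55, 17.316667
p95, maximum = 31.901667, 1439.55

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # values below this would be flagged
upper_fence = q3 + 1.5 * iqr       # values above this would be flagged

print(f"IQR = {iqr:.6f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print(f"95th percentile above fence: {p95 > upper_fence}")
print(f"maximum above fence: {maximum > upper_fence}")
```

Under this rule the 95th percentile is still inside the upper fence while the maximum is far outside it, which fits the very high skewness reported for this variable.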
Interactions
Correlations
| | DOLocationID | PULocationID | RatecodeID | duration_minutes | extra | fare_amount | passenger_count | payment_type | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DOLocationID | 1.000 | 0.109 | 0.030 | -0.061 | 0.096 | -0.069 | 0.009 | 0.036 | -0.003 | -0.002 | -0.062 | -0.071 |
| PULocationID | 0.109 | 1.000 | 0.038 | -0.058 | 0.101 | -0.070 | -0.006 | 0.021 | -0.014 | -0.047 | -0.062 | -0.070 |
| RatecodeID | 0.030 | 0.038 | 1.000 | 0.000 | 0.000 | 0.678 | 0.000 | 0.000 | 0.000 | 0.061 | 0.453 | 0.525 |
| duration_minutes | -0.061 | -0.058 | 0.000 | 1.000 | 0.014 | 0.964 | 0.020 | 0.000 | 0.382 | 0.231 | 0.943 | 0.839 |
| extra | 0.096 | 0.101 | 0.000 | 0.014 | 1.000 | 0.041 | 0.026 | 0.007 | 0.014 | 0.000 | 0.051 | 0.116 |
| fare_amount | -0.069 | -0.070 | 0.678 | 0.964 | 0.041 | 1.000 | 0.021 | 0.040 | 0.401 | 0.267 | 0.978 | 0.937 |
| passenger_count | 0.009 | -0.006 | 0.000 | 0.020 | 0.026 | 0.021 | 1.000 | 0.025 | -0.025 | 0.011 | 0.013 | 0.028 |
| payment_type | 0.036 | 0.021 | 0.000 | 0.000 | 0.007 | 0.040 | 0.025 | 1.000 | 0.193 | 0.017 | 0.087 | 0.031 |
| tip_amount | -0.003 | -0.014 | 0.000 | 0.382 | 0.014 | 0.401 | -0.025 | 0.193 | 1.000 | 0.176 | 0.547 | 0.387 |
| tolls_amount | -0.002 | -0.047 | 0.061 | 0.231 | 0.000 | 0.267 | 0.011 | 0.017 | 0.176 | 1.000 | 0.280 | 0.271 |
| total_amount | -0.062 | -0.062 | 0.453 | 0.943 | 0.051 | 0.978 | 0.013 | 0.087 | 0.547 | 0.280 | 1.000 | 0.915 |
| trip_distance | -0.071 | -0.070 | 0.525 | 0.839 | 0.116 | 0.937 | 0.028 | 0.031 | 0.387 | 0.271 | 0.915 | 1.000 |
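The High correlation alerts in the overview can be reproduced from this matrix by listing, for each variable, the other variables whose coefficient is at least 0.5 in absolute value. The sketch below does this for the six flagged variables. The 0.5 cut-off is an assumption; it is the value that makes the output agree with the alerts.

```python
THRESHOLD = 0.5  # assumed cut-off for a "high" correlation

# Upper triangle of the matrix above, restricted to the six flagged variables.
pairs = {
    ("RatecodeID", "duration_minutes"): 0.000,
    ("RatecodeID", "fare_amount"): 0.678,
    ("RatecodeID", "tip_amount"): 0.000,
    ("RatecodeID", "total_amount"): 0.453,
    ("RatecodeID", "trip_distance"): 0.525,
    ("duration_minutes", "fare_amount"): 0.964,
    ("duration_minutes", "tip_amount"): 0.382,
    ("duration_minutes", "total_amount"): 0.943,
    ("duration_minutes", "trip_distance"): 0.839,
    ("fare_amount", "tip_amount"): 0.401,
    ("fare_amount", "total_amount"): 0.978,
    ("fare_amount", "trip_distance"): 0.937,
    ("tip_amount", "total_amount"): 0.547,
    ("tip_amount", "trip_distance"): 0.387,
    ("total_amount", "trip_distance"): 0.915,
}

# Collect, for every variable, the partners at or above the threshold.
high = {}
for (a, b), r in pairs.items():
    if abs(r) >= THRESHOLD:
        high.setdefault(a, []).append(b)
        high.setdefault(b, []).append(a)

for name in sorted(high):
    print(f"{name}: {', '.join(sorted(high[name]))}")
```

For example, tip_amount is paired only with total_amount, and RatecodeID with fare_amount and trip_distance, matching the counts of "other fields" in the alert messages.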
Missing values
Sample
| | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2017-04-15 23:32:20 | 2017-04-15 23:49:03 | 1 | 4.37 | 1 | 4 | 112 | 2 | 16.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 17.80 | 16.716667 |
| 5 | 2017-03-25 20:34:11 | 2017-03-25 20:42:11 | 6 | 2.30 | 1 | 161 | 236 | 1 | 9.0 | 0.5 | 0.5 | 2.06 | 0.0 | 0.3 | 12.36 | 8.000000 |
| 6 | 2017-05-03 19:04:09 | 2017-05-03 20:03:47 | 1 | 12.83 | 1 | 79 | 241 | 1 | 47.5 | 1.0 | 0.5 | 9.86 | 0.0 | 0.3 | 59.16 | 59.633333 |
| 7 | 2017-08-15 17:41:06 | 2017-08-15 18:03:05 | 1 | 2.98 | 1 | 237 | 114 | 1 | 16.0 | 1.0 | 0.5 | 1.78 | 0.0 | 0.3 | 19.58 | 21.983333 |
| 12 | 2017-06-09 19:00:26 | 2017-06-09 19:20:11 | 1 | 3.00 | 1 | 13 | 148 | 1 | 15.0 | 1.0 | 0.5 | 3.35 | 0.0 | 0.3 | 20.15 | 19.750000 |
| 13 | 2017-11-06 23:35:05 | 2017-11-06 23:42:57 | 1 | 2.39 | 1 | 209 | 25 | 1 | 9.5 | 0.5 | 0.5 | 2.16 | 0.0 | 0.3 | 12.96 | 7.866667 |
| 16 | 2017-08-15 19:48:08 | 2017-08-15 20:00:37 | 1 | 3.60 | 1 | 163 | 41 | 1 | 12.5 | 1.0 | 0.5 | 2.85 | 0.0 | 0.3 | 17.15 | 12.483333 |
| 18 | 2017-04-10 18:12:58 | 2017-04-10 18:17:39 | 2 | 0.63 | 1 | 263 | 262 | 2 | 5.0 | 1.0 | 0.5 | 0.00 | 0.0 | 0.3 | 6.80 | 4.683333 |
| 19 | 2017-03-05 04:01:07 | 2017-03-05 04:14:11 | 2 | 2.77 | 1 | 79 | 68 | 1 | 11.5 | 0.5 | 0.5 | 3.20 | 0.0 | 0.3 | 16.00 | 13.066667 |
| 20 | 2017-12-30 23:52:44 | 2017-12-30 23:58:57 | 1 | 1.10 | 1 | 166 | 238 | 2 | 6.5 | 0.5 | 0.5 | 0.00 | 0.0 | 0.3 | 7.80 | 6.216667 |
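The sample rows above are consistent with duration_minutes being the drop-off time minus the pickup time, expressed in minutes. The report does not state how the column was derived, so this is an assumption; the sketch checks it against three of the rows, and `duration_minutes` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def duration_minutes(pickup, dropoff):
    """Elapsed time between two timestamp strings, in minutes."""
    delta = datetime.strptime(dropoff, FMT) - datetime.strptime(pickup, FMT)
    return delta.total_seconds() / 60

# (pickup, drop-off, reported duration_minutes) copied from the sample rows.
rows = [
    ("2017-04-15 23:32:20", "2017-04-15 23:49:03", 16.716667),
    ("2017-03-25 20:34:11", "2017-03-25 20:42:11", 8.000000),
    ("2017-05-03 19:04:09", "2017-05-03 20:03:47", 59.633333),
]

for pickup, dropoff, reported in rows:
    computed = duration_minutes(pickup, dropoff)
    print(f"{pickup}: computed {computed:.6f}, reported {reported:.6f}")
```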
| | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22681 | 2017-06-09 18:24:49 | 2017-06-09 18:36:15 | 1 | 1.79 | 1 | 234 | 144 | 1 | 9.5 | 1.0 | 0.5 | 1.00 | 0.00 | 0.3 | 12.30 | 11.433333 |
| 22683 | 2017-08-03 17:30:04 | 2017-08-03 17:41:52 | 1 | 1.17 | 1 | 107 | 170 | 1 | 8.5 | 1.0 | 0.5 | 2.06 | 0.00 | 0.3 | 12.36 | 11.800000 |
| 22684 | 2017-08-03 16:36:32 | 2017-08-03 16:46:23 | 2 | 1.20 | 1 | 68 | 50 | 1 | 8.0 | 1.0 | 0.5 | 1.95 | 0.00 | 0.3 | 11.75 | 9.850000 |
| 22685 | 2017-07-05 22:42:46 | 2017-07-05 22:49:29 | 1 | 1.01 | 1 | 144 | 79 | 1 | 6.5 | 0.5 | 0.5 | 1.56 | 0.00 | 0.3 | 9.36 | 6.716667 |
| 22686 | 2017-02-08 18:13:26 | 2017-02-08 19:34:11 | 5 | 10.64 | 1 | 170 | 70 | 1 | 52.0 | 1.0 | 0.5 | 14.84 | 5.54 | 0.3 | 74.18 | 80.750000 |
| 22688 | 2017-08-05 21:23:29 | 2017-08-05 21:26:11 | 3 | 0.44 | 1 | 230 | 163 | 2 | 4.0 | 0.5 | 0.5 | 0.00 | 0.00 | 0.3 | 5.30 | 2.700000 |
| 22691 | 2017-01-06 01:50:14 | 2017-01-06 01:56:47 | 1 | 2.12 | 1 | 170 | 79 | 1 | 8.0 | 0.5 | 0.5 | 0.00 | 0.00 | 0.3 | 9.30 | 6.550000 |
| 22692 | 2017-07-16 03:22:51 | 2017-07-16 03:40:52 | 1 | 5.70 | 1 | 249 | 17 | 1 | 19.0 | 0.5 | 0.5 | 4.05 | 0.00 | 0.3 | 24.35 | 18.016667 |
| 22693 | 2017-08-10 22:20:04 | 2017-08-10 22:29:31 | 1 | 0.89 | 1 | 229 | 170 | 1 | 7.5 | 0.5 | 0.5 | 1.76 | 0.00 | 0.3 | 10.56 | 9.450000 |
| 22694 | 2017-02-24 17:37:23 | 2017-02-24 17:40:39 | 3 | 0.61 | 1 | 48 | 186 | 2 | 4.0 | 1.0 | 0.5 | 0.00 | 0.00 | 0.3 | 5.80 | 3.266667 |
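In the sample rows, total_amount equals the sum of the six charge columns, which helps explain its high correlation with fare_amount and tip_amount. The sketch below checks this on three rows from the table above. Treating the relationship as holding for every row is an assumption; only the sampled rows are verified here.

```python
# (fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount)
# copied from rows 22681, 22686 and 22692 of the sample above.
rows = [
    (9.5, 1.0, 0.5, 1.00, 0.00, 0.3, 12.30),
    (52.0, 1.0, 0.5, 14.84, 5.54, 0.3, 74.18),
    (19.0, 0.5, 0.5, 4.05, 0.00, 0.3, 24.35),
]

for *charges, reported_total in rows:
    # Round to cents to avoid spurious floating-point differences.
    computed_total = round(sum(charges), 2)
    print(f"computed {computed_total:.2f}, reported {reported_total:.2f}")
```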